Speech Codecs

Field of Invention 

The present invention relates to speech encoding in a communication system. 

Background to the Invention 

Cellular communication networks are commonplace today. Cellular
communication networks typically operate in accordance with a given standard or
specification. For example, the standard or specification may define the
communication protocols and/or parameters that shall be used for a connection.
Examples of the different standards and/or specifications include, without limiting
to these, GSM (Global System for Mobile communications), GSM/EDGE
(Enhanced Data rates for GSM Evolution), AMPS (American Mobile Phone
System), WCDMA (Wideband Code Division Multiple Access) or 3rd generation
(3G) UMTS (Universal Mobile Telecommunications System), IMT 2000
(International Mobile Telecommunications 2000) and so on.

In a cellular communication network, voice data is typically captured as an
analogue signal, digitised in an analogue to digital (A/D) converter and then
encoded before transmission over the wireless air interface between a user
equipment, such as a mobile station, and a base station. The purpose of the
encoding is to compress the digitised signal and transmit it over the air interface
with the minimum amount of data whilst maintaining an acceptable signal quality
level. This is particularly important as radio channel capacity over the wireless air
interface is limited in a cellular communication network. The sampling and
encoding techniques used are often referred to as speech encoding techniques or
speech codecs.

Often speech can be considered as bandlimited to between approximately 200 Hz
and 3400 Hz. The typical sampling rate used by an A/D converter to convert an
analogue speech signal into a digital signal is either 8 kHz or 16 kHz. The sampled
digital signal is then encoded, usually on a frame by frame basis, resulting in a
digital data stream with a bit rate that is determined by the speech codec used for
encoding. The higher the bit rate, the more data is encoded, which results in a
more accurate representation of the input speech frame. The encoded speech
can then be decoded and passed through a digital to analogue (D/A) converter to
recreate the original speech signal.
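The relationship between sampling rate, frame length and bit rate can be made concrete with a little arithmetic. The sketch below assumes a 20 ms frame (the typical frame duration quoted later in this description) and uses the 12.2 kbps and 4.75 kbps AMR rates mentioned in the test results; the function names are illustrative only.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int = 20) -> int:
    """Number of PCM samples covered by one speech frame."""
    return sample_rate_hz * frame_ms // 1000


def bits_per_frame(bit_rate_bps: int, frame_ms: int = 20) -> int:
    """Number of encoded bits produced per frame at a given codec bit rate."""
    return bit_rate_bps * frame_ms // 1000


print(samples_per_frame(8000))   # 160 samples at 8 kHz
print(samples_per_frame(16000))  # 320 samples at 16 kHz
print(bits_per_frame(12200))     # 244 bits per frame at 12.2 kbps
print(bits_per_frame(4750))      # 95 bits per frame at 4.75 kbps
```

At the lowest rate the encoder therefore has fewer than 100 bits to describe 160 samples, which is why the parametric model described below matters.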

An ideal speech codec will encode the speech with as few bits as possible
thereby optimising channel capacity, while producing decoded speech that
sounds as close to the original speech as possible. In practice there is usually a
trade-off between the bit rate of the codec and the quality of the decoded speech.

In today's cellular communication networks, speech encoding can be divided
roughly into two categories: variable rate and fixed rate encoding.

In variable rate encoding, a source based rate adaptation (SBRA) algorithm is
used for classification of active speech. Speech of differing classes is encoded
by different speech modes, each operating at a different rate. The speech modes
are usually optimised for each speech class. An example of variable rate speech
encoding is the enhanced variable rate speech codec (EVRC).

In fixed rate speech encoding, voice activity detection (VAD) and discontinuous
transmission (DTX) functionality is utilised, which classifies speech into active
speech and silence periods. During detected silence periods, transmission is
performed less frequently to save power and increase network capacity. For
example, in GSM during active speech every speech frame, typically 20 ms in
duration, is transmitted, whereas during silence periods, only every eighth speech
frame is transmitted. Typically, active speech is encoded at a fixed bit rate and
silence periods with a lower bit rate.
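The transmission pattern described above can be sketched as a simple scheduler. This is a minimal illustration, assuming one VAD decision per frame and that the first frame of each silence period is sent; a real DTX handler has further behaviour (for example hangover periods and dedicated silence descriptor frames) that is not modelled here. The function name and parameters are hypothetical.

```python
def frames_to_transmit(vad_flags, silence_interval=8):
    """Return the indices of frames that are sent over the air.

    vad_flags: iterable of 1 (active speech) or 0 (silence), one per frame.
    silence_interval: during silence only every n-th frame is sent.
    """
    sent = []
    silent_run = 0  # number of consecutive silent frames seen so far
    for index, active in enumerate(vad_flags):
        if active:
            sent.append(index)
            silent_run = 0
        else:
            if silent_run % silence_interval == 0:
                sent.append(index)
            silent_run += 1
    return sent


flags = [1, 1] + [0] * 16 + [1]
print(frames_to_transmit(flags))  # [0, 1, 2, 10, 18]
```

In this example only 5 of 19 frames are transmitted, which illustrates the power and capacity saving.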

Multi-rate speech codecs, such as the adaptive multi-rate (AMR) codec and the
adaptive multi-rate wideband (AMR-WB) codec, were developed to include
VAD/DTX functionality and are examples of fixed rate speech encoding. The bit
rate of the speech encoding, also known as the codec mode, is based on factors
such as the network capacity and radio channel conditions of the air interface.

AMR was developed by the 3rd Generation Partnership Project (3GPP) for
GSM/EDGE and WCDMA communication networks. In addition, it has also been
envisaged that AMR will be used in future packet switched networks. AMR is
based on Algebraic Code Excited Linear Prediction (ACELP) coding. The AMR
and AMR-WB codecs consist of 8 and 9 active bit rates respectively and also
include VAD/DTX functionality. The sampling rate in the AMR codec is 8 kHz. In
the AMR-WB codec the sampling rate is 16 kHz.

ACELP coding operates using a model of how the signal source is generated, and
extracts from the signal the parameters of the model. More specifically, ACELP
coding is based on a model of the human vocal system, where the throat and
mouth are modelled as a linear filter and speech is generated by a periodic
vibration of air exciting the filter. The speech is analysed on a frame by frame
basis by the encoder and for each frame a set of parameters representing the
modelled speech is generated and output by the encoder. The set of parameters
may include excitation parameters and the coefficients for the filter as well as
other parameters. The output from a speech encoder is often referred to as a
parametric representation of the input speech signal. The set of parameters is
then used by a suitably configured decoder to regenerate the input speech signal.
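The source-filter model can be illustrated with a toy synthesis routine: a periodic pulse train (standing in for the periodic vibration of air) excites an all-pole linear filter (standing in for the throat and mouth). This is only a sketch of the modelling idea, not the AMR decoder; the filter convention A(z) = 1 + a1 z^-1 + ... + ap z^-p and all names are assumptions made for the example.

```python
def pulse_train(length, period):
    """Periodic excitation: a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def synthesize(excitation, a):
    """All-pole synthesis: s[n] = e[n] - sum_{k=1..p} a[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]
        out.append(acc)
    return out


# A single pulse through a one-pole filter decays geometrically.
print(synthesize([1.0, 0.0, 0.0, 0.0], [-0.5]))  # [1.0, 0.5, 0.25, 0.125]

# A pulse train with a 40-sample period through the same filter.
speech_like = synthesize(pulse_train(160, 40), [-0.5])
```

The encoder's task is the inverse: given the speech, estimate the filter coefficients and the excitation that would reproduce it.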

Details of the AMR and AMR-WB codecs can be found in the 3GPP TS 26.090
and 3GPP TS 26.190 technical specifications. Further details of the AMR-WB
codec and VAD can be found in the 3GPP TS 26.194 technical specification. All
the above documents are incorporated herein by reference.

Both AMR and AMR-WB codecs are multi-rate codecs with independent codec
modes or bit rates. In both the AMR and AMR-WB codecs, the mode selection is
based on the network capacity and radio channel conditions. However, the
codecs may also be operated using a variable rate scheme such as SBRA where
the codec mode selection is further based on the speech class. The codec mode
can then be selected independently for each analysed speech frame (at 20 ms
intervals) and may be dependent on the source signal characteristics, average
target bit rate and supported set of codec modes. The network in which the
codec is used may also limit the performance of SBRA. For example, in GSM,
the codec mode can be changed only once every 40 ms.
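The effect of the 40 ms restriction on per-frame mode selection can be sketched as follows. The example assumes 20 ms frames, so a mode must be held for at least two frames before it can change; the function name is hypothetical and any further alignment rules of a real network are not modelled.

```python
def limit_mode_changes(requested, hold_frames=2):
    """Apply per-frame mode requests, allowing a change only after the
    current mode has been in use for at least `hold_frames` frames."""
    applied = []
    current = None
    age = 0  # frames for which the current mode has been in use
    for mode in requested:
        if current is None or (mode != current and age >= hold_frames):
            current = mode
            age = 0
        applied.append(current)
        age += 1
    return applied


requested = [12200, 4750, 5900, 5900, 12200]
print(limit_mode_changes(requested))  # [12200, 12200, 5900, 5900, 12200]
```

The request for 4.75 kbps in the second frame is ignored because the mode was set only one frame earlier, which shows how the network can limit what SBRA achieves.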

By using SBRA, the average bit rate may be reduced without any noticeable
degradation in the decoded speech quality. The advantage of lower average bit
rate is lower transmission power and hence higher overall capacity of the network.

Typical SBRA algorithms determine the speech class of the sampled speech
signal based on speech characteristics. These speech classes may include low
energy, transient, unvoiced and voiced sequences. The subsequent speech
encoding is dependent on the speech class. Therefore, the accuracy of the
speech classification is important as it determines the speech encoding and
associated encoding rate. In previously known systems, the speech class is
determined before speech encoding begins.

However, absolute speech quality degrades as the bit rate is lowered in a multi-rate
speech codec. This is especially true when strong environmental background
noise (for example car, street, cafeteria) is present during the call. This makes
the operation of source based rate adaptation challenging, because when there is
no active speech present (that is, the callers are not talking), the codec is only
coding background noise and will probably select quite low bit rate modes in order
to save system capacity. Users may hear the degradation even if it happens
during non-active speech. For this reason, the AMR and AMR-WB codecs may
utilise SBRA together with VAD/DTX functionality to lower the bit rate of the
transmitted data during silence periods. During periods of normal speech,
standard SBRA techniques are used to encode the data. During silence periods,
VAD detects the silence and interrupts transmission (DTX) thereby reducing the
overall bit rate of the transmission. In this case, background noise parameters are
transmitted less often and then averaged in the receiving end to produce
"comfort" noise, which sounds quite good.

However, not all systems have DTX functionality, and therefore they have to code
background noise using the normal speech codec modes. In these systems,
when the bit rate decreases to a very low rate, the speech codec starts to
produce audible artefacts in the coded background noise, which are perceived as
annoying at the receiving end.

A paper published in the IEEE Workshop of 1999, authored by Hagen and
Ekudden, proposes a solution to this problem. In an existing ACELP speech
coder, waveform matching LPAS structures are employed which provide high
quality for speech signals, but have performance limitations for background noise.
According to the paper authored by Hagen and Ekudden, a novel adaptive gain
coding technique is used in the ACELP coder in which energy matching is used in
combination with the traditional waveform matching criteria to provide high quality
for both speech and background noise. The solution offered in that paper,
however, requires more complex coding, which is applied both to speech and to
background noise.

It is an aim of the present invention to find a simpler solution to improve
the quality of coded background noise.

Summary of the Invention 

According to one aspect of the present invention there is provided a method of 
encoding speech in a communications system comprising the steps of: receiving 
a speech signal including voice signals and background signals; detecting voice 
activity and providing an indicator when no voice activity is detected; encoding the 
speech signal to generate a plurality of parameters representing the signal; and 
when said indicator is not present, outputting a first parametric representation of 
the speech signal comprising said plurality of parameters, and, when the indicator 
is present, modifying at least one of the parameters and outputting a second 
parametric representation of the speech signal including the modified parameter. 

According to another aspect of the invention there is provided a communications 
system arranged to encode speech, the system comprising: an input adapted to 
receive a speech signal including voice signals and background signals; a voice 
activity detector arranged to detect voice activity and to provide an indicator when 
no voice activity is detected; an encoder adapted to encode the speech signal to 
generate a plurality of parameters representing the signal; modifying circuitry 
operable when the indicator is present to modify at least one of the parameters; 
and an output at which a first parametric representation of the speech signal is 
output when the indicator is not present, the first parametric representation 
comprising said plurality of parameters, and at which a second parametric 
representation of the speech signal is output when the indicator is present, the 
second parametric representation including the modified parameter.

Brief Description of Drawings

For a better understanding of the present invention reference will now be made by
way of example only to the accompanying drawings, in which:

Figure 1 illustrates a communication network in which embodiments of the
present invention can be applied;

Figure 2 illustrates a block diagram of a prior art arrangement;

Figure 3 illustrates a block diagram of an embodiment of the invention; and

Figure 4 illustrates test results.

Detailed description of embodiments 

The present invention is described herein with reference to particular examples.
The invention is not, however, limited to such examples.

Figure 1 illustrates a typical cellular telecommunication network 100 that supports
an AMR speech codec. The network 100 comprises various network elements
including a mobile station (MS) 101, a base transceiver station (BTS) 102 and a
transcoder (TC) 103. The MS communicates with the BTS via the uplink radio
channel 113 and the downlink radio channel 126. The BTS and TC communicate
with each other via communication links 115 and 124. The BTS and TC form part
of the core network. For a voice call originating from the MS, the MS receives
speech signals 110 at a multi-rate speech encoder module 111.

In this example, the speech signals are digital speech signals converted from
analogue speech signals by a suitably configured analogue to digital (A/D)
converter (not shown). The multi-rate speech encoder module encodes the digital
speech signal 110 into a speech encoded signal on a frame by frame basis,
where the typical frame duration is 20 ms. The speech encoded signal is then
transmitted to a multi-rate channel encoder module 112. The multi-rate channel
encoder module further encodes the speech encoded signal from the multi-rate
speech encoder module. The purpose of the multi-rate channel encoder module
is to provide coding for error detection and/or error correction purposes. The
encoded signal from the multi-rate channel encoder is then transmitted across the
uplink radio channel 113 to the BTS. The encoded signal is received at a
multi-rate channel decoder module 114, which performs channel decoding on the
received signal. The channel decoded signal is then transmitted across
communication link 115 to the TC 103. In the TC 103, the channel decoded
signal is passed into a multi-rate speech decoder module 116, which decodes the
input signal and outputs a digital speech signal 117 corresponding to the input
digital speech signal 110.

A similar sequence of steps to that of a voice call originating from an MS to a TC
occurs when a voice call originates from the core network side, such as from the
TC via the BTS to the MS. When the voice call starts from the TC, the speech
signal 122 is directed towards a multi-rate speech encoder module 123, which
encodes the digital speech signal 122. The speech encoded signal is transmitted
from the TC to the BTS via communication link 124. At the BTS, it is received at
a multi-rate channel encoder module 125. The multi-rate channel encoder
module 125 further encodes the speech encoded signal from the multi-rate
speech encoder module 123 for error detection and/or error correction purposes.
The encoded signal from the multi-rate channel encoder module is transmitted
across the downlink radio channel 126 to the MS. At the MS, the received signal
is fed into a multi-rate channel decoder module 127 and then into a multi-rate
speech decoder module 128, which perform channel decoding and speech
decoding respectively. The output signal from the multi-rate speech decoder is a
digital speech signal 129 corresponding to the input digital speech signal 122.

Link adaptation may also take place in the MS and BTS. Link adaptation selects
the AMR multi-rate speech codec mode according to transmission channel
conditions. If the transmission channel conditions are poor, the number of bits
used for speech encoding can be decreased (lower bit rate) and the number of
bits used for channel encoding can be increased to try and protect the transmitted
information. However, if the transmission channel conditions are good, the
number of bits used for channel encoding can be decreased and the number of
bits used for speech encoding increased to give a better speech quality.
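The trade-off between speech bits and channel coding bits can be illustrated numerically. The sketch below assumes a full-rate GSM traffic channel carrying roughly 456 gross bits per 20 ms frame (22.8 kbit/s) and ignores in-band signalling, so the figures are approximate. The eight AMR mode rates are listed for illustration, and the mapping from a quality score to a mode is purely hypothetical; it is not the link adaptation algorithm of any standard.

```python
GROSS_BITS_PER_FRAME = 456  # assumption: 22.8 kbit/s gross rate x 20 ms
AMR_MODES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]


def bit_split(mode_bps, frame_ms=20, gross_bits=GROSS_BITS_PER_FRAME):
    """Return (speech_bits, remaining_bits_for_channel_coding) per frame."""
    speech_bits = mode_bps * frame_ms // 1000
    return speech_bits, gross_bits - speech_bits


def select_mode(quality, modes=AMR_MODES_BPS):
    """Map a hypothetical channel quality score in [0, 1] to a codec mode:
    poor channels get a low speech rate and hence more protection."""
    q = min(max(quality, 0.0), 1.0)
    index = min(int(q * len(modes)), len(modes) - 1)
    return modes[index]


print(bit_split(12200))  # (244, 212): good channel, most bits carry speech
print(bit_split(4750))   # (95, 361): poor channel, most bits carry protection
print(select_mode(0.0), select_mode(1.0))  # 4750 12200
```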

The MS may comprise a link adaptation module 130, which takes data 140 from
the downlink radio channel to determine a preferred downlink codec mode for
encoding the speech on the downlink channel. The data 140 is fed into a
downlink quality measurement module 131 of the link adaptation module 130,
which calculates a quality indicator message for the downlink channel, QId. QId is
transmitted from the downlink quality measurement module 131 to a mode
request generator module 132 via connection 141. Based on QId, the mode
request generator module 132 calculates a preferred codec mode for the
downlink channel 126. The preferred codec mode is transmitted in the form of a
codec mode request message for the downlink channel, MRd, to the multi-rate
channel encoder module 112 via connection 142. The multi-rate channel encoder
module 112 transmits MRd through the uplink radio channel to the BTS.

In the BTS, MRd may be transmitted via the multi-rate channel decoder module
114 to a link adaptation module 133. Within the link adaptation module in the
BTS, the codec mode request message for the downlink channel, MRd, is
translated into a codec mode command message for the downlink channel, MCd.
This function may occur in the downlink mode control module 120 of the link
adaptation module 133. The downlink mode control module transmits MCd via
connection 146 to communications link 115 for transmission to the TC.

In the TC, MCd is transmitted to the multi-rate speech encoder module 123 via
connection 147. The multi-rate speech encoder module 123 can then encode the
incoming speech 122 with the codec mode defined by MCd. The encoded
speech, encoded with the adapted codec mode defined by MCd, is transmitted to
the BTS via connection 148 and onto the MS as described above. Furthermore,
a codec mode indicator message for the downlink radio channel, MId, is transmitted
via connection 149 from the multi-rate speech encoder module 123 to the BTS
and onto the MS, where it is used in the decoding of the speech in the multi-rate
speech decoder 127 at the MS.

A similar sequence of steps to link adaptation for the downlink radio channel may
also be utilised for link adaptation of the uplink radio channel. The link adaptation
module 133 in the BTS may comprise an uplink quality measurement module
118, which receives data from the uplink radio channel and determines a quality
indicator message, QIu, for the uplink radio channel. QIu is transmitted from the
uplink quality measurement module 118 to the uplink mode control module 119
via connection 150. The uplink mode control module 119 receives QIu together
with network constraints from the network constraints module 121 and determines
a preferred codec mode for the uplink encoding. The preferred codec mode is
transmitted from the uplink mode control module 119 in the form of a codec mode
command message for the uplink radio channel, MCu, to the multi-rate channel
encoder module 125 via connection 151. The multi-rate channel encoder module
125 transmits MCu together with the encoded speech signal over the downlink
radio channel to the MS.

In the MS, MCu is transmitted to the multi-rate channel decoder module 127 and
then to the multi-rate speech encoder 111 via connection 153, where it is used to
determine a codec mode for encoding the input speech signal 110. As with the
speech encoding for the downlink radio channel, the multi-rate speech encoder
module for the uplink radio channel generates a codec mode indicator message
for the uplink radio channel, MIu. MIu is transmitted from the multi-rate speech
encoder module 111 to the multi-rate channel encoder module 112 via
connection 154, which in turn transmits MIu via the uplink radio channel to the
BTS and then to the TC. MIu is used at the TC in the multi-rate speech decoder
module 116 to decode the received encoded speech with a codec mode
determined by MIu.

Figure 2 illustrates a block diagram of the multi-rate speech encoder module 111
and 123 of Figure 1 in the prior art. The multi-rate speech encoder module 200
may operate according to an AMR-WB codec and comprise a voice activity
detection (VAD) module 202, which is connected to both a source based rate
adaptation (SBRA) algorithm module 203 and a discontinuous transmission (DTX)
module 205. The VAD module receives a digital speech signal 201 and
determines whether the signal comprises active speech or silence periods.
During a silence period, the DTX module is activated and transmission interrupted
for the duration of the silence period. During periods of active speech, the
speech signal may be transmitted to the SBRA algorithm module. The SBRA
algorithm module is controlled by the RDA module 204. The RDA module defines
the used average bit rate in the network and sets the target average bit rate for
the SBRA algorithm module. The SBRA algorithm module receives speech
signals and determines a speech class for the speech signal based on its speech
characteristics. The SBRA algorithm module is connected to a speech encoder
206, which encodes the speech signal received from the SBRA algorithm module
with a codec mode based on the speech class selected by the SBRA algorithm
module. The speech encoder operates using Algebraic Code Excited Linear
Prediction (ACELP) coding.



The codec mode selection may depend on many factors. For example, low
energy speech sequences may be classified and coded with a low bit rate codec
mode without noticeable degradation in speech quality. On the other hand,
during transient sequences, where the signal fluctuates, the speech quality can
degrade rapidly if codec modes with lower bit rates are used. Coding of voiced
and unvoiced speech sequences may also be dependent on the frequency
content of the sequence. For example, a low frequency speech sequence can be
coded with a lower bit rate without speech quality degradation, whereas high
frequency voice and noise-like, unvoiced sequences may need a higher bit rate
representation.

The speech encoder 206 in Figure 2 comprises a linear prediction coding (LPC) 
calculation module 207, a long term prediction (LTP) calculation module 208 and 
a fixed code book excitation module 209. The speech signal is processed by the 
LPC calculation module, LTP calculation module and fixed code book excitation 
module on a frame by frame basis, where each frame is typically 20ms long. The 
output of the speech encoder consists of a set of parameters representing the 
input speech signal. 

Specifically, the LPC calculation module 207 determines the LPC filter 
corresponding to the input speech frame by minimising the residual error of the 
speech frame. Once the LPC filter has been determined, it can be represented 
by a set of LPC filter coefficients for the filter. 
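One common way to find the coefficients that minimise the residual (prediction error) energy is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below is a generic illustration of that technique under the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p; it omits windowing and other conditioning steps and is not taken from the codec specifications.

```python
def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the LPC coefficients.

    Returns (a, err) where a[0] == 1.0 and err is the remaining
    prediction error energy after `order` recursion steps.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of an ideal first-order process with pole at 0.5.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, err)  # a[1] is -0.5, a[2] is zero, err is 0.75
```

For this input the recursion recovers the single pole exactly: the second coefficient is zero because a first-order predictor already removes all the predictable structure.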

The LPC filter coefficients are quantized by the LPC calculation module before
transmission. The main purpose of quantization is to code the LPC filter
coefficients with as few bits as possible without introducing additional spectral
distortion. Typically, LPC filter coefficients, {a1, ..., ap}, are transformed into a
different domain before quantization. This is done because direct quantization of
the coefficients of the LPC filter, which is an infinite impulse response (IIR) filter,
may cause filter instability. Even slight errors in the IIR filter coefficients can cause
significant distortion throughout the spectrum of the speech signal.

The LPC calculation module converts the LPC filter coefficients into the immittance
spectral pair (ISP) domain before quantization. However, the ISP domain
coefficients may be further converted into the immittance spectral frequency (ISF)
domain before quantization.

The LTP calculation module 208 calculates an LTP parameter from the LPC
residual. The LTP parameter is closely related to the fundamental frequency of
the speech signal and is often referred to as a "pitch-lag" parameter, "pitch delay"
parameter or "lag", which describes the periodicity of the speech signal in terms of
speech samples. The pitch-delay parameter is calculated by using an adaptive
codebook by the LTP calculation module.

A further parameter, the LTP gain, is also calculated by the LTP calculation
module and is closely related to the fundamental periodicity of the speech signal.
The LTP gain is an important parameter used to give a natural representation of
the speech. Voiced speech segments have especially strong long-term
correlation. This correlation is due to the vibrations of the vocal cords, which
usually have a pitch period in the range from 2 to 20 ms.
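A rough, open-loop estimate of the lag can be obtained by searching for the delay that maximises the normalised correlation of the signal with a delayed copy of itself. The sketch below is a generic illustration of that idea, not the search defined in the codec specifications; the search range of 16 to 160 samples corresponds to the 2 to 20 ms pitch period at an assumed 8 kHz sampling rate.

```python
import math


def open_loop_lag(signal, min_lag=16, max_lag=160):
    """Return the lag (in samples) with the highest normalised correlation."""
    best_lag, best_score = min_lag, float("-inf")
    n = len(signal)
    for lag in range(min_lag, max_lag + 1):
        corr = sum(signal[i] * signal[i - lag] for i in range(lag, n))
        energy = sum(signal[i - lag] ** 2 for i in range(lag, n))
        score = corr / math.sqrt(energy) if energy > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# A pulse train with a period of 40 samples (5 ms at 8 kHz).
periodic = [1.0 if n % 40 == 0 else 0.0 for n in range(320)]
print(open_loop_lag(periodic))  # 40
```

Multiples of the true period (80, 120, 160) also correlate well, but fewer samples overlap at those lags, so the normalised score favours the fundamental in this example.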

The fixed codebook excitation module 209 calculates the excitation signal, which
represents the input to the LPC filter. The excitation signal is a set of parameters
represented by innovation vectors with a fixed codebook combined with the LTP
parameter. In a fixed codebook, algebraic code is used to populate the
innovation vectors. The innovation vector contains a small number of nonzero
pulses with predefined interlaced sets of potential positions. The excitation signal
is sometimes referred to as index to algebraic codebook.

The output from the speech encoder 210 in Figure 2 is an encoded speech signal
represented by the parameters determined by the LPC calculation module, the
LTP calculation module and the fixed code book excitation module, which include:

1. LPC parameters quantised in ISP domain describing the spectral content of
the speech signal (spectral parameters);

2. LTP parameters describing the periodic structure of the speech signal
(including open-loop lag);

3. ACELP excitation quantisation describing the residual signal after the linear
predictors (residual vector);

4. Signal gain.

The bit rate of the codec mode used by the speech encoder may affect the 
parameters determined by the speech encoder. Specifically, the number of bits 
used to represent each parameter varies according to the bit rate used. The 
higher the bit rate, the more bits may be used to represent some or all of the 
parameters, which may result in a more accurate representation of the input 
speech signal. 

Figure 3 illustrates an embodiment of the present invention with a modified 
speech encoder 206'. In addition to the LPC calculation block 207, LTP 
calculation block 208 and fixed code book excitation block 209 of the prior art, the 
modified speech encoder 206' includes a number of respective smoothing blocks 
which are shown in dotted lines. The smoothing blocks act to modify parameters 
to have the effect of smoothing background noise in the parameterised signal. 
Although these are illustrated as separate blocks in the speech encoder, it will be 
understood that they will be implemented in practice as part of the module to 
which they belong, by appropriate software, firmware or hardware modifications to 
that module. Thus, there is a first smoothing module 210 associated with the 
LPC calculation module 207 which acts to modify the LSP vector for the current 
frame to generate a modified LSP vector LspNew which is transmitted from the 
speech encoder as part of the parametrical representation 210 in place of the 
unmodified LSP vector. 

In the LTP module both lag (pitch delay) and gain are produced. The lag is first
calculated in open loop and then in closed loop around the open loop lag value.
The open loop search for the lag gives a rough value, which is refined by the
closed loop calculation. The LTP gain is related to the LTP lag (pitch) value. The
gain and lag parameters are denoted generally as lag parameters in Figure 3.

A second smoothing module 211 is associated with the LTP calculation module
208 for the purpose of modifying the open-loop lag value to generate a modified
gain parameter for transmission as part of the parametrical representation. A
third smoothing module 212 is associated with the fixed code book excitation
module 209 for the purpose of generating a modified residual vector NewRes for
transmission as part of the parametrical representation 210.

The VAD module 202, which detects voice activity, includes a flag 202a which
indicates whether or not there is voice activity. If the VAD flag is set to zero, this
indicates that there is no voice activity and this causes the smoothing modules
210, 211 and 212 to become active. With the VAD flag set to one, i.e. when
speech activity is detected, the smoothing modules 210, 211 and 212 do not
operate, and the parametrical representation 210 is transmitted with the original
parameters from the modules 207, 208 and 209 without smoothing or
modification.
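The gating of the smoothing modules by the VAD flag can be sketched as a small selection function. This is a schematic illustration only: the dictionary layout, the parameter names and the callables standing in for the smoothing modules are all hypothetical.

```python
def parametric_representation(params, vad_flag, smoothers):
    """Return the parameters to transmit for one frame.

    params: dict of encoder parameters, e.g. {"lsp": ..., "lag": ..., "res": ...}
    vad_flag: 1 when voice activity is detected, 0 otherwise.
    smoothers: dict mapping a parameter name to a function that modifies it.
    """
    out = dict(params)  # copy so the caller's parameters are left untouched
    if vad_flag == 1:
        return out      # first representation: original parameters
    for name, modify in smoothers.items():
        out[name] = modify(out[name])
    return out          # second representation: modified parameters


frame = {"lsp": [0.1, 0.2], "lag": 40}
smoothers = {"lag": lambda lag: 99}  # placeholder for a real modification
print(parametric_representation(frame, 1, smoothers))  # {'lsp': [0.1, 0.2], 'lag': 40}
print(parametric_representation(frame, 0, smoothers))  # {'lsp': [0.1, 0.2], 'lag': 99}
```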


As illustrated in Figure 3, the first smoothing module 210 is associated with a
counter 213 which is named VadOffCountLspBuf in the following description.
Similarly, the third smoothing module 212 is associated with a counter 214 which
is labelled LspNoiseFact in the following description.

A description of the operation of each of the smoothing modules 210, 211 and
212 is given below.

Spectral parameters modification (LSP - module 210)

VadOffCountLspBuf is the counter 213, which is set to -1 when the VAD
flag is set to zero. Otherwise the counter is updated as follows, based
on a count of incoming frames.

$$
\mathit{VadOffCountLspBuf} =
\begin{cases}
\mathit{VadOffCountLspBuf} + 1, & \mathit{VadOffCountLspBuf} < 5 \\
5, & \mathit{VadOffCountLspBuf} \geq 5
\end{cases}
$$

If the VAD flag is set to zero and the VadOffCountLspBuf counter is greater
than zero, the following modification is done for the LSP vector Lsp of
the current frame.

$$
\mathit{LspTemp} = \operatorname{average}\bigl(\mathit{LspBuf}(1), \ldots, \mathit{LspBuf}(\mathit{VadOffCountLspBuf})\bigr)
$$

$$
\mathit{LspNew} = \frac{\mathit{Lsp}}{\mathit{VadOffCountLspBuf} + 1} + \frac{\mathit{LspTemp} \cdot \mathit{VadOffCountLspBuf}}{\mathit{VadOffCountLspBuf} + 1}
$$

LspBuf is a buffer 215 including the LSP vectors of the last 5 frames. LspBuf
is updated only when the VAD flag is set to zero. LspBuf(1) is the LSP
vector of the last frame, LspBuf(2) is the LSP vector of the second last
frame, etc. LspTemp is the average of the last frames depending on the
count, VadOffCountLspBuf. LspNew is the average of the current and
past frames, also depending on VadOffCountLspBuf, and represents
the smoothed vector which is transmitted as part of the parametrical
representation 210.
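The LSP smoothing can be sketched in a few lines. The sketch makes several assumptions that the description leaves open: the counter is reset to -1 on the frame where the VAD flag changes to zero and is incremented (up to 5) on each following no-voice frame; the buffer stores the unsmoothed LSP vector of each no-voice frame; and the buffer is updated after the current frame has been smoothed. The class and method names are hypothetical.

```python
from collections import deque


class LspSmoother:
    def __init__(self, max_count=5):
        self.max_count = max_count
        self.count = -1                     # VadOffCountLspBuf
        self.buf = deque(maxlen=max_count)  # LspBuf, most recent frame first
        self.prev_vad = 1

    def process(self, lsp, vad):
        """Return the LSP vector to transmit for the current frame."""
        if vad != 0:
            self.prev_vad = vad
            return list(lsp)                # voice active: no modification
        if self.prev_vad != 0:
            self.count = -1                 # assumed: reset on the transition
        elif self.count < self.max_count:
            self.count += 1
        self.prev_vad = 0
        out = list(lsp)
        if self.count > 0:
            n = self.count
            past = list(self.buf)[:n]       # LspBuf(1) ... LspBuf(n)
            lsp_temp = [sum(col) / n for col in zip(*past)]
            out = [x / (n + 1) + t * n / (n + 1)
                   for x, t in zip(lsp, lsp_temp)]
        self.buf.appendleft(list(lsp))      # assumed: store the unsmoothed vector
        return out


s = LspSmoother()
s.process([1.0], 1)         # voice frame, passed through
s.process([2.0], 0)         # counter = -1, no smoothing yet
s.process([4.0], 0)         # counter = 0, no smoothing yet
print(s.process([6.0], 0))  # counter = 1: 6/2 + 4*1/2 = [5.0]
```

With the counter at its maximum of 5, LspNew is the plain average of the current vector and the five buffered vectors, so frame-to-frame fluctuation of the spectral envelope of the noise is reduced.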

Open-loop LTP lag modification (module 211)

If the VAD flag is set to zero, the open-loop LTP lag parameter is
randomised. The randomised open-loop LTP lag can take values from 20
to 120 (samples in the time domain).
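A minimal sketch of the lag randomisation follows. The description gives only the range of values, so the uniform distribution, the inclusive bounds and the function name are assumptions made for the example.

```python
import random


def modified_open_loop_lag(open_loop_lag, vad, rng=random, low=20, high=120):
    """Return the open-loop lag to use for the current frame.

    When no voice activity is detected (vad == 0) the computed lag is
    replaced by a random integer in [low, high]; otherwise it is kept.
    """
    if vad == 0:
        return rng.randint(low, high)  # assumed: uniform, bounds inclusive
    return open_loop_lag


rng = random.Random(0)  # seeded generator so runs are repeatable
print(modified_open_loop_lag(57, 1, rng))  # 57, voice active so unchanged
lag = modified_open_loop_lag(57, 0, rng)   # some value between 20 and 120
```

Randomising the lag avoids imposing a spurious periodic structure on background noise, which has no true pitch.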
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LP residual modification (Res - module 212) 

Res(0)is the residual vector of the current frame. Modified residual 
vsctcr cf the current frsins, .AfewRs o(Q) , is 02 ! cu ! 2tsd as fellows! 

New Re ^(O) = C * ((1 - Coef) * Re s(0) + Coef * Re sMax(0) * Rand Re s) 

5 where Rand Res is random vector including values between {-1...1}. 

ResMax(0) is the maximum absolute value of the current residual 
vector Re 5(0) . 

Coefls the noise contribution for the residual vector and it is 
increased in steps after VAD flag is set to zero as follows: 

1 0 Coef = IspNoiseFact * 0.0625 

where IspNoiseFact is the counter 214. The counter is set to 0, when 
voice activity detection flag is set to zero. Otherwise it is updated as 
follows, based. on a count of incoming frames. 



IspNoiseFact = 



{IspNoiseFact < 8 IspNoiseFact + + 
IspNoiseFact > 8 IspNoiseFact = 8 



Therefore the Coef value will be 0.5 after 8 frames, and the noise 
contribution will then be 50% of the LP residual. C is the scaling 
factor, which is calculated as follows: 



$$C = \sqrt{\frac{\mathrm{ResEnergyEst}(0)}{\mathrm{NewResEnergy}}}$$

where NewResEnergy is the energy of the modified residual vector. 
ResEnergyEst(0) is the residual energy estimate of the current frame 
and it is calculated as follows: 



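$$\mathrm{ResEnergyEst}(0) = \begin{cases} 0.9 \cdot \mathrm{ResEnergyEst}(-1) + 0.1 \cdot \mathrm{ResEnergy}(0), & \text{when VAD} = 0 \\ 0.66 \cdot \mathrm{ResEnergyEst}(-1) + 0.33 \cdot \mathrm{ResEnergy}(0), & \text{when VAD} = 1 \end{cases}$$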



where ResEnergyEst(-1) is the residual energy estimate of the last 
frame and ResEnergy(0) is the energy of the residual vector Res(0) of 
the current frame. 
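
The residual modification can be sketched end to end as follows. This is an illustrative reading of the description, not a reference implementation. It assumes that the energy of a vector is the sum of its squared samples, that the scaling factor C is the square root of the energy ratio, and that NewResEnergy is measured on the noise-mixed vector before scaling. All function names are hypothetical.

```python
import math
import random

MAX_NOISE_COUNT = 8  # upper limit of the counter 214
COEF_STEP = 0.0625   # Coef = counter * 0.0625, so Coef reaches 0.5


def update_noise_count(count):
    """Advance the counter by one frame, saturating at 8."""
    return count + 1 if count < MAX_NOISE_COUNT else MAX_NOISE_COUNT


def energy(vec):
    """Assumption: energy is the sum of squared samples."""
    return sum(x * x for x in vec)


def update_energy_estimate(prev_est, res_energy, vad_flag):
    """Recursive residual energy estimate, weights taken from the text."""
    if vad_flag == 0:
        return 0.9 * prev_est + 0.1 * res_energy
    return 0.66 * prev_est + 0.33 * res_energy


def modify_residual(res, count, energy_est, rng=random):
    """Mix noise into a non-empty residual and rescale its energy."""
    coef = count * COEF_STEP
    res_max = max(abs(x) for x in res)  # ResMax(0)
    # (1 - Coef) * Res(0) + Coef * ResMax(0) * RandRes, RandRes in [-1, 1]
    mixed = [(1 - coef) * x + coef * res_max * rng.uniform(-1.0, 1.0)
             for x in res]
    mixed_energy = energy(mixed)
    if mixed_energy == 0.0:
        return mixed  # nothing to scale
    c = math.sqrt(energy_est / mixed_energy)  # assumed form of C
    return [c * x for x in mixed]


if __name__ == "__main__":
    res = [1.0, -2.0, 0.5, 3.0]
    est = update_energy_estimate(10.0, energy(res), vad_flag=0)
    new_res = modify_residual(res, 8, est, random.Random(1))
    print(energy(new_res), est)
```

Under these assumptions the energy of the output equals the smoothed estimate, so the added noise changes the fine structure of the residual without changing its level.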

A listening test was conducted with two experiments: a car noise test with an SNR 
of 10 dB and a street noise test with an SNR of 20 dB. As can be seen from Figure 4, in both 
experiments the implementation of the smoothing function increased the overall 
speech quality. In fact, it was determined that by using the smoothing functions 
at 4.75 kbps, the speech quality could be improved to the level of AMR 12.2 kbps. 

In the above-described embodiment the randomised open loop LTP lag value is 
used to generate the modified gain parameter output as part of the second 
parametric representation of the speech signal. It will be appreciated however 
that the gain parameter itself could be modified by randomisation or in some 
other way. 

CLAIMS: 

1. A method of encoding speech in a communications system comprising the 
steps of: 

receiving a speech signal including voice signals and background signals; 

detecting voice activity and providing an indicator when no voice activity is 
detected; 

encoding the speech signal to generate a plurality of parameters 
representing the signal; and 

when said indicator is not present, outputting a first parametric 
representation of the speech signal comprising said plurality of parameters, and, 
when the indicator is present, modifying at least one of the parameters and 
outputting a second parametric representation of the speech signal including the 
modified parameter. 

2. A method according to claim 1, wherein the plurality of parameters 
includes a linear prediction calculation vector of quantised linear prediction filter 
coefficients. 

3. A method according to claim 1, in which the plurality of parameters 
includes a gain parameter based on open-loop lag value. 

4. A method according to claim 1, wherein the plurality of parameters 
includes a residual vector. 

5. A method according to claim 1, wherein the speech signal is received as a 
sequence of samples arranged in frames. 

6. A method according to claim 5, wherein the step of modifying at least one 
of the parameters includes smoothing the parameter for a current frame based on 
characteristics of the parameter in other frames of the signal. 

7. A method according to claim 6, wherein said other frames include adjacent 
frames. 

8. A method according to claim 6, wherein the step of modifying at least one 
of the parameters includes producing a count of the number of received frames 
up to a predetermined maximum, and using said count in the modifying step. 

9. A method according to claim 1, wherein the step of modifying at least one 
of the parameters includes generating a randomised value for the parameter. 

10. A method according to claim 1, wherein the step of modifying at least one 
of the parameters includes taking into account the energy levels associated with 
the parameter. 

11. A method according to claim 1, wherein the step of modifying at least one 
of the parameters includes modifying a value utilised in the generation of the 
parameter, whereby modification of that value produces a modified parameter. 

12. A method according to claim 11, wherein the step of modifying the value 
comprises randomising the value. 

13. A communications system arranged to encode speech, the system 
comprising: 

an input adapted to receive a speech signal including voice signals and 
background signals; 

a voice activity detector arranged to detect voice activity and to provide an 
indicator when no voice activity is detected; 

an encoder adapted to encode the speech signal to generate a plurality of 
parameters representing the signal; 

modifying circuitry operable when the indicator is present to modify at least 
one of the parameters; and 

an output at which a first parametric representation of the speech signal is 
output when the indicator is not present, the first parametric representation 
comprising said plurality of parameters, and at which a second parametric 
representation of the speech signal is output when the indicator is present, the 
second parametric representation including the modified parameter. 

14. A communications system according to claim 13 when used in an 
environment in which the speech signal is received as a sequence of samples 
arranged in frames, wherein the modifying circuitry is arranged to smooth the 
parameter for a current frame based on characteristics of the parameter in other 
frames of the speech signal. 

15. A communications system according to claim 13 when used in an 
environment wherein the speech signal is received as a sequence of samples 
arranged in frames, wherein the modifying circuitry is arranged to produce a count 
of the number of received frames to a predetermined maximum, and to use said 
count in the step of modifying the parameter. 

16. A communications system according to claim 13 wherein the modifying 
circuitry is operable to generate a randomised value for the parameter. 

17. A communications system according to claim 13 wherein the modifying 
circuitry is operable to take into account energy levels associated with the 
parameter. 